Kestrel-3

I/O Architecture

Abstract

With magnetic and solid-state storage devices, parallel and serial interfaces, constantly evolving protocols, and the desire to support as many system environments as I reasonably can, it would take an extensive amount of effort to support each (environment * device) combination. Even for small numbers of each, I lack the bandwidth to do so; I work a full-time job, work out in a gym, socialize with friends on the weekends, and I wish to spend as much time with my family as I can. Therefore, what I need is a mass storage solution that I can understand and implement, such that my effort is maximally leveraged. By reducing the (environment * device) workload to an (environment + device) workload, I can free up much of my time to focus on things which are more important to me.

Introduction

After completing the Kestrel-2DX project, which represents my second time working with SD card technology for storage, I vowed that I would never work directly with the SD/MMC card protocol again. I found it to be too fickle, card support for established standards woefully lacking (including cards manufactured by the standards' authors!), and an all-around huge time-sink in debugging. I was not happy that it literally took me as long to get basic SD cards working at all as I'd spent writing the Kestrel-2DX's Forth implementation from scratch.

Imagine, for a moment, how much additional time it would take me if I were to port SD card support to the Tripos operating system. All that effort getting Forth working would be a sunk cost, as Tripos works in a fundamentally different way than Forth does. I'd have to re-invest all that time debugging and testing all over again.

I should point out that I never succeeded in getting SDHC or SDXC working, despite their similarity to baseline SD protocol. I see no point in even attempting any of the newer standards that have come out since.

Imagine another scenario, which I'm positive is only a matter of time before it happens. In this future world, I now have a (perhaps legacy) hard drive that I want to support. DX-Forth and Tripos already exist as operating systems, and perhaps someone is busy porting Plan 9 or NetBSD to the Kestrel-3. Clearly, after investing the work needed to make the hard drive work in DX-Forth, I'd prefer to be able to use this hard drive with minimal to no changes in Tripos, NetBSD, et al. At present, however, this isn't feasible. I'd need to write a driver stack for DX-Forth, a driver stack for Tripos, a driver stack for NetBSD, and so forth. The reason should be fairly clear: the hard drive will speak a fundamentally different protocol than the SD card storage system.

Instead of focusing on the Herculean task of maintaining platform-specific drivers and technology-specific interfaces, I need a holistic I/O architecture that lets me leverage not only software I've already written, but also hardware I've already designed. Filesystems notwithstanding, it should be possible to use a completely new block device with Forth, Tripos, and some future port of NetBSD and Plan 9 without the device engineer (most likely me) needing to worry about driver support.

This document is an abridged version of another which discloses the various designs evaluated up to the current one ("Design 5"). This version skips all the historical material, and includes only the technical details of the current design.

Design 5: Final Refinement

As with all things software and hardware not yet built, "final" simply means that this is the design I feel is least burdensome to a new hardware developer. Obviously, the proof will be in the pudding; things will almost certainly change in the future as I run into problems and discover nuances in the solution space.

Physical Configuration

The Kestrel computer would need a minimum of one I/O channel. More than one I/O channel is permitted, and probably preferable to a switched configuration (switches will be discussed later). I/O channels can be of many types; the type described in this document is based on classic UART technology.

To function with the 9P server in the device, an I/O channel must provide the following services:

  1. It must be full-duplex or provide a suitable emulation of full-duplex operation.
  2. It must be a point-to-point interface in both directions (computer to device, and device to computer); or, offer a suitable emulation of a bidirectional, point-to-point interface.
  3. It must transport bytes from one side of the link to the other.
  4. It must be balanced; either side of the link may initiate a transfer at any time.
  5. The bytes transferred must arrive in the order they were sent.

While designing this I/O architecture, I envisioned two different versions of the I/O channel: a slower speed channel suitable for hobby and slower-speed device development, and a higher speed channel suitable for more mature implementations of the Kestrel Computer family.

It should be observed that the physical interface definitions outlined here are not intended to limit one's imagination. Future physical standards are not only possible, but likely. However, as long as the previously listed characteristics are adhered to, equipment to adapt one interconnect standard to another can be made inexpensively, thus enhancing compatibility between otherwise competing technologies. For example, an inexpensive FPGA with soft-core microcontroller on-board can translate from 115kbps asynchronous serial to 25Mbps synchronous serial without involving a 9P implementation. As I write this, such a component can be built for under US$15.

115kbps Asynchronous 3.3V Serial Interconnects

This slower speed interface is intended to address the needs of evolving the Kestrel-3 from concept stage to a working model. It capitalizes on the observation that slower, proven technologies can often amplify the effects of inefficiencies and the presence of bugs. By reusing commercially available cabling and widely available parts, it is also the cheapest way to build a new Kestrel-compatible peripheral.

These interfaces rely on 1x6 PMOD Type 4 connectors. The cables used to connect computers and devices are straight-through 1x6 PMOD cables. This configuration keeps the circuit design simple and affordable, allowing both off-the-shelf FPGA as well as simple microcontroller devices to be attached with commonly available parts at the lowest cost.

Burst transfer speed of this interconnect is, nominally, 11.52 kilobytes per second.

The computer's PMOD connector will have the following pin-out:

Pin  Name  Driver    Purpose
1    CTS   Device    Unused.
2    TXD   Computer  Data stream to the device.
3    RXD   Device    Data stream from the device.
4    RTS   Computer  Unused; keep 0V.
5    GND   Computer  0V reference.
6    VCC   Computer  +3.3V reference.

Note that RTS and CTS signals are explicitly not used in this specification, and are tied to 0V. These pins are to be considered reserved for future redefinition.

Thanks to the prevalence of straight-through PMOD connector cables versus cross-over cables, devices have a swapped set of signal interpretations on their PMOD interfaces:

Pin  Name  Driver    Purpose
1    CTS   Device    Unused; keep 0V.
2    RXD   Computer  Data stream to the device.
3    TXD   Device    Data stream from the device.
4    RTS   Computer  Unused.
5    GND   Computer  0V reference.
6    VCC   Computer  +3.3V reference.

To help distinguish one port interpretation from another, the computer's port is referred to as an upstream port, while a device's port is referred to as a downstream port. As you might expect, a switch must follow a similar convention, whereby all of its device attachment points are wired to be upstream ports.

Although links may conceivably operate at any supported EIA-232 transfer rate, initial operation must start at 115,200 bits per second. The protocol for altering the data rate between the upstream and downstream port is not defined.

The serial interface is required to support and use the following parameters:

The data bits are transmitted least significant bit first, according to normal EIA-232 operation.

25Mbps Synchronous 3.3V Serial Interconnects

This higher speed interface is intended for mature implementations of the Kestrel-3 and related computers. It builds upon the asynchronous serial interconnect by adding clock forwarding to support higher data transfer rates. I envision this class of interconnect to be DMA-driven, perhaps with semi-intelligent I/O off-load processing capabilities. In hardware terms, I anticipate this interconnect to host primarily FPGA-based designs.

These interfaces also rely on 1x6 PMOD Type 4 connectors, but recycle the RTS and CTS signals for clock signals. The cables used to connect computers and devices are straight-through 1x6 PMOD cables. This configuration retains the simple and affordable circuit design.

The burst transfer speed of this interconnect is, nominally, 2.5 million bytes per second.

The computer's PMOD connector will have the following pin-out:

Pin  Name  Driver    Purpose
1    RXC   Device    Clock for RXD.
2    TXD   Computer  Data stream to the device.
3    RXD   Device    Data stream from the device.
4    TXC   Computer  Clock for TXD.
5    GND   Computer  0V reference.
6    VCC   Computer  +3.3V reference.

Thanks to the prevalence of straight-through PMOD connector cables versus cross-over cables, devices have a swapped set of signal interpretations on their PMOD interfaces:

Pin  Name  Driver    Purpose
1    TXC   Device    Clock for TXD.
2    RXD   Computer  Data stream to the device.
3    TXD   Device    Data stream from the device.
4    RXC   Computer  Clock for RXD.
5    GND   Computer  0V reference.
6    VCC   Computer  +3.3V reference.

To help distinguish one port interpretation from another, the computer's port is referred to as an upstream port, while a device's port is referred to as a downstream port. As you might expect, a switch must follow a similar convention, whereby all of its device attachment points are wired to be upstream ports.

This link must start running at 25Mbps. No protocol exists for altering the data rate between the upstream and downstream ports.

The serial interface is required to support and use the following parameters:

The data bits are transmitted least significant bit first, according to normal EIA-232 operation.

Physical Layer Receiver State Machine

The following state machine describes the behavior of the receiver. This state machine can be implemented in hardware or in software (e.g., as part of the device driver stack). When the device link starts up, the physical layer receiver on each peer starts in the RcvHuntByte state. Note the use of the NUL byte as a frame delimiter and synchronization boundary, as appropriate for COBS encoding.

State  Name         Predicate                                                Actions                        Next
PR0    RcvHuntByte  1 Byte received is not $00                               Ignore                         PR0
                    2 Byte received is $00                                   Ignore                         PR1
-----  -----------  -------------------------------------------------------  -----------------------------  ----
PR1    RcvDataByte  1 Byte recv'd is not $00 and buffer space available      Save byte                      PR1
                    2 Byte recv'd is not $00 and buffer space not available  Ignore                         PR0
                    3 Byte recv'd is $00                                     Dispatch frame to link layer.  PR1
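
For readers who prefer code to tables, the receiver above can be sketched in software roughly as follows. This is only an illustration, not a normative implementation: the class name, the `on_frame` callback, and the default buffer capacity are assumptions made for the example.

```python
class PhysicalReceiver:
    """Illustrative sketch of the PR0/PR1 receiver; all names are hypothetical."""

    HUNT = "RcvHuntByte"   # PR0
    DATA = "RcvDataByte"   # PR1

    def __init__(self, on_frame, capacity=133):
        self.on_frame = on_frame   # assumed callback into the data link layer
        self.capacity = capacity   # assumed buffer size, in bytes
        self.state = self.HUNT
        self.buf = bytearray()

    def feed(self, byte):
        """Process one byte received from the link."""
        if self.state == self.HUNT:
            if byte == 0x00:               # PR0.2: delimiter seen, start collecting
                self.buf.clear()
                self.state = self.DATA
            return                         # PR0.1: anything else is ignored
        if byte == 0x00:                   # PR1.3: end of frame, hand it upward
            self.on_frame(bytes(self.buf))
            self.buf.clear()
        elif len(self.buf) < self.capacity:
            self.buf.append(byte)          # PR1.1: save the byte
        else:                              # PR1.2: overflow, hunt for the next NUL
            self.buf.clear()
            self.state = self.HUNT
```

Note that this sketch dispatches whatever lies between two NUL bytes, including empty frames; it leaves the decision to drop short frames to the data link layer.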

Physical Layer Transmitter State Machine

The following state machine describes the behavior of the transmitter. This state machine can be implemented in hardware or in software. When the device link starts up, the transmitter on each peer starts in SndWait state. Note that the transmitter assumes the data to be sent is already encoded with COBS encapsulation.

State  Name        Predicates                                         Actions                                  Next
PT0    SndWait     1 No frame to send                                 Wait for a frame to send.                PT0
                   2 Frame ready to send                              Send first byte of frame.                PT1
-----  ----------  -------------------------------------------------  ---------------------------------------  ----
PT1    SndSending  1 Byte not finished sending                        Wait for current byte to finish sending  PT1
                   2 Byte finished sending and more bytes to send     Send next byte of frame.                 PT1
                   3 Byte finished sending and no more bytes to send  Signal ready for next frame.             PT0
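
Since the transmitter expects frames that are already COBS-encoded, it may help to see what that encoding looks like. The following is a minimal sketch of the general COBS algorithm, not code taken from any Kestrel implementation; the function names are made up for this example.

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode data so that the result contains no $00 bytes."""
    out = bytearray()
    block = bytearray()              # non-zero bytes collected since the last code byte
    for b in data:
        if b == 0:
            out.append(len(block) + 1)   # code byte: distance to the implied zero
            out += block
            block.clear()
        else:
            block.append(b)
            if len(block) == 254:        # longest run a single code byte can describe
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)           # final group, with no implied zero after it
    out += block
    return bytes(out)


def cobs_decode(enc: bytes) -> bytes:
    """Inverse of cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(enc):
        code = enc[i]
        if code == 0:
            raise ValueError("zero byte inside COBS data")
        block = enc[i + 1:i + code]
        if len(block) != code - 1:
            raise ValueError("truncated COBS block")
        out += block
        i += code
        if code != 0xFF and i < len(enc):
            out.append(0)                # restore the zero the code byte stood for
    return bytes(out)
```

Because the encoded bytes never contain $00, the NUL byte remains free to act as the frame delimiter that the receiver hunts for.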

Data Link Layer

One thing is certain: we need a data link layer. As illustrated above, we could just pass 9P messages directly back and forth over the serial links, but this is fraught with uncertainty. We could lose a critical byte, or experience single-bit errors. Note that 9P size fields are 32 bits wide; imagine a flip of bit 31 causing a device to become unresponsive as it attempts to handle a 2GB 9P frame at 115.2kbps.

Frame Types

I currently define four types of frames:

  1. D-DATA. Data frames carry the 9P protocol stream, which indirectly means they are how you talk to devices.
  2. D-DATA-ACK. Acknowledge frames are used to prevent the 9P server from ever seeing erroneous data, and to ensure this data arrives in the order intended.
  3. D-RESET. Reset frames are used to synchronize the data link layers of the sender and the receiver.
  4. D-RESET-ACK. Reset acknowledgements are used to complete the synchronization process set off by a D-RESET frame.

All frames take the following form prior to frame encapsulation:

ctl (... optional data ...) fcs[2]

The control field ctl consists of two sub-fields:

cccc ....  4-bit Sequence Counter
.... tttt  4-bit Frame Type

A sender may send up to 15 frames before it must wait for acknowledgements to arrive; however, I advise against this. The most any transmitter should send is two frames at a time, in an attempt to exploit the natural link pipelining a serial interface provides. This would allow, for example, the transmitter to be sending frame F+1 while receiving and processing an acknowledgement for frame F at the same time. The only time sending more than two is of any value is on half-duplex links, which are not specified for this application.

The frame type field currently has four values defined; the remainder are reserved for future consideration.

Frame Type  Interpretation
0           D-RESET
1           D-RESET-ACK
2           D-DATA
3           D-DATA-ACK
4..15       reserved

Frames of unknown types must be treated as corrupt frames, and simply ignored by the receiver.
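
As a small illustration of the control field layout, the sequence counter occupies the upper four bits and the frame type the lower four. The helper names below are hypothetical and exist only for this sketch.

```python
# Frame type codes from the table above.
D_RESET, D_RESET_ACK, D_DATA, D_DATA_ACK = 0, 1, 2, 3
KNOWN_TYPES = {D_RESET, D_RESET_ACK, D_DATA, D_DATA_ACK}


def pack_ctl(seq, frame_type):
    """Build the ctl byte: cccc in the high nibble, tttt in the low nibble."""
    if not 0 <= seq <= 15 or not 0 <= frame_type <= 15:
        raise ValueError("both sub-fields are 4 bits wide")
    return (seq << 4) | frame_type


def unpack_ctl(ctl):
    """Split a ctl byte into (sequence counter, frame type)."""
    return (ctl >> 4) & 0x0F, ctl & 0x0F


def is_known_type(frame_type):
    """Types 4..15 are reserved; a receiver treats such frames as corrupt."""
    return frame_type in KNOWN_TYPES
```

For example, a D-DATA frame with sequence counter 5 would carry a ctl byte of $52.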

D-DATA Frames

A data frame may contain up to 128 bytes of payload data. When a payload is fully utilized and frame encapsulated, this limit introduces only about 3.76% overhead. Except for bulk data transfers, this overhead isn't generally a concern.

For bulk transfers of data, the link is about 96.24% efficient. Thus, for a 115.2kbps link, we can reasonably expect to see about 11087 bytes per second of throughput.

I chose 128 bytes because it's trivially simple to predict the worst-case buffer size for that amount of data after frame encapsulation: 133 bytes. 128/133 = 96.24% efficiency; ergo, about 3.76% overhead. It also seems quite approachable for even relatively modest MCUs. This could help keep costs quite low for devices that don't otherwise need the resources.
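
One way to arrive at the 133-byte figure is sketched below. The breakdown is my reading of the frame layout rather than something stated outright: one ctl byte, two fcs bytes, the worst-case COBS overhead for a frame this short, and a single NUL delimiter per frame. The ten bit times per byte are inferred from the 11.52 kilobytes per second burst rate quoted for the 115kbps link.

```python
import math

MAX_PAYLOAD = 128      # bytes of 9P data per D-DATA frame
CTL_BYTES = 1          # control field
FCS_BYTES = 2          # frame check sequence
DELIMITER_BYTES = 1    # assumed: one NUL delimiter per back-to-back frame

raw_bytes = CTL_BYTES + MAX_PAYLOAD + FCS_BYTES    # frame before encapsulation
cobs_overhead = math.ceil(raw_bytes / 254)         # worst-case COBS expansion
wire_bytes = raw_bytes + cobs_overhead + DELIMITER_BYTES

efficiency = MAX_PAYLOAD / wire_bytes

BIT_RATE = 115200
BITS_PER_BYTE = 10     # assumed: start bit, 8 data bits, stop bit
throughput = BIT_RATE / BITS_PER_BYTE * efficiency  # payload bytes per second

print(wire_bytes, round(efficiency * 100, 2), round(throughput))
```

Under these assumptions the sketch yields 133 bytes on the wire, roughly 96.2% efficiency, and a little under 11,090 payload bytes per second.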

Devices must be built to support at least one 133 byte data link buffer.

D-DATA-ACK Frames

After receiving a D-DATA frame and confirming it is correct and valid, the device or computer should respond with a D-DATA-ACK frame at its earliest convenience. The acknowledge frame has its sequence counter set to the most recently received D-DATA frame's sequence counter.

These frames do not carry information.

D-RESET Frames

The sequence counters in data packets are, in effect, global state. As with all global variables, they must be initialized prior to use. Before a computer can reliably talk to an attached device, or before the device responds to the computer, both need to agree on what the next sequence number will be.

A device or computer which receives a D-RESET frame must reset the sequence counter of the next D-DATA frame it sends to that specified in the D-RESET frame.

These frames do not carry information.

D-RESET-ACK Frames

After performing a link reset, a device or computer must respond with a D-RESET-ACK frame. The sequence counter of this frame indicates the sequence counter it expects the D-RESET sender to use for its next D-DATA frame.

These frames do not carry information.

Data Link State Machines

The state machine descriptions in this section are not normative, and only serve to illustrate one possible implementation. However, if implemented as-is, you should end up with an implementation that can reliably exchange data between a computer and a device.

A real implementation of the data link layer involves both a transmitter and a receiver. These components are logically separate; however, they must communicate with each other to coordinate expectations. For example, the transmitter must tell the receiver whether or not it anticipates receiving a data or reset acknowledgement frame.

The following shared state is anticipated in most implementations.

Name                Type                    Purpose
DATA-ACK-EXPECTED   Boolean                 A flag which, if true, tells the receiver that a D-DATA-ACK frame is expected.
RESET-ACK-EXPECTED  Boolean                 A flag which, if true, tells the receiver that a D-RESET-ACK frame is expected.
LINKQ               Deque of frames         A queue of all frames except D-DATA frames.
DATAQ               Deque of frames         A queue of only D-DATA frames.
UNACKED             List of frames          A list of unacknowledged frames (both data and link types).
RSEQ                4-bit unsigned integer  The sequence number we expect the next received D-DATA frame to have.
TSEQ                4-bit unsigned integer  The sequence number of the next D-DATA frame to be transmitted.
9P-BUF              Array of bytes          The input buffer used by the 9P server receive loop.
T1                  Timer                   Timer which, upon expiring, causes unacknowledged frames to be retransmitted.

Receiver State Machine

The data link receiver state machine appears below. Upon bringing up a device link, the data link receiver will start in the FrmWait state. When valid data is received, it is extracted directly into the 9P server's command input buffer. The 9P server is further notified of new data via an up-call, an interrupt, or other suitable notification method.

State  Name        Predicate                                                      Actions                                                Next
DR0    FrmWait     1 Frame not available                                          Wait for physical layer to deliver a frame.            DR0
                   2 Frame available (see state PR1).                             Decode COBS content.                                   DR1
-----  ----------  -------------------------------------------------------------  -----------------------------------------------------  ----
DR1    FrmDecoded  1 Length < 3                                                   Drop frame.                                            DR0
                   2 Len >= 3, FCS not OK                                         Drop frame.                                            DR0
                   3 Len >= 3, FCS OK, unknown type                               Drop frame.                                            DR0
                   4 Len >= 3, FCS OK, D-DATA type, seq != RSEQ                   Drop frame.                                            DR0
                   5 Len >= 3, FCS OK, D-DATA type, seq OK, no room in 9P-BUF     Drop frame.                                            DR0
                   6 Len >= 3, FCS OK, D-DATA type, seq OK, room in 9P-BUF        Deposit contents in 9P-BUF;
                                                                                  notify 9P server of new data;
                                                                                  increment RSEQ;
                                                                                  queue D-DATA-ACK frame.                                DR0
                   7 Len >= 3, FCS OK, D-DATA-ACK type, not DATA-ACK-EXPECTED     Drop frame.                                            DR0
                   8 Len >= 3, FCS OK, D-DATA-ACK type, DATA-ACK-EXPECTED         Recycle covered frame buffers.                         DR2
                   9 Len >= 3, FCS OK, D-RESET type                               Clear output queue; clear unacknowledged frames list;
                                                                                  reset expected sequence number;
                                                                                  queue D-RESET-ACK response.                            DR0
                   10 Len >= 3, FCS OK, D-RESET-ACK type, not RESET-ACK-EXPECTED  Drop frame.                                            DR0
                   11 Len >= 3, FCS OK, D-RESET-ACK type, RESET-ACK-EXPECTED      Reset expected sequence number;
                                                                                  stop expecting reset acknowledgements.                 DR0
-----  ----------  -------------------------------------------------------------  -----------------------------------------------------  ----
DR2    FrmAcked    1 UNACKED empty                                                Cancel T1; reset DATA-ACK-EXPECTED.                    DR0
                   2 UNACKED not empty                                            Restart T1.                                            DR0

When in the FrmWait state, the data link will wait for delivery of a COBS frame from the physical layer implementation. After decoding, the data link enters FrmDecoded state, where it will try to figure out what to do with the received frame. Once it's done processing the frame, the data link returns to FrmWait state, where it will wait for another frame to arrive (if it hasn't arrived already).
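
The decisions made in the FrmDecoded state can be condensed into a single function, sketched below. This is illustrative only: the action names are invented, and `fcs_ok` is a hypothetical caller-supplied check, since this document does not define the FCS algorithm.

```python
D_RESET, D_RESET_ACK, D_DATA, D_DATA_ACK = 0, 1, 2, 3


def dr1_action(frame, rseq, room_in_9p_buf,
               data_ack_expected, reset_ack_expected, fcs_ok):
    """Return a label for the action taken on one COBS-decoded frame."""
    if len(frame) < 3:                    # too short to hold ctl plus fcs[2]
        return "drop"
    if not fcs_ok(frame):                 # hypothetical FCS check
        return "drop"
    seq, frame_type = frame[0] >> 4, frame[0] & 0x0F
    if frame_type == D_DATA:
        if seq != rseq or not room_in_9p_buf:
            return "drop"                 # sender retransmits when T1 expires
        return "deliver"                  # deposit, notify, bump RSEQ, queue ack
    if frame_type == D_DATA_ACK:
        return "recycle" if data_ack_expected else "drop"
    if frame_type == D_RESET:
        return "reset"                    # clear queues, queue D-RESET-ACK
    if frame_type == D_RESET_ACK:
        return "reset-complete" if reset_ack_expected else "drop"
    return "drop"                         # reserved frame types
```

Every "drop" outcome is silent; the design relies on the sender's T1 timer to recover, rather than on negative acknowledgements.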

Transmitter State Machine

The data link transmitter state machine appears below. When the link is brought up for the first time, state FrmReset is the initial state. While only one side needs to issue a D-RESET frame to fully initialize the link, implementations must be prepared to handle the case where both sides attempt to reset the link at the same time.

State  Name         Predicates                                Actions                                                   Next
DT0    FrmReset                                               Clear all queues and lists;
                                                              expect a reset acknowledgement;
                                                              enqueue a D-RESET frame.                                  DT1
-----  -----------  ----------------------------------------  --------------------------------------------------------  ----
DT1    FrmWaitTx    1 LINKQ empty and DATAQ empty             Wait for something to send.                               DT1
                    2 T1 expired, UNACKED not empty           Move frames from UNACKED back onto their queues.          DT1
                    3 LINKQ not empty                         Start sending head of LINKQ (see state PT0).              DT2
                    4 LINKQ empty, DATAQ not empty            Start sending head of data queue (see state PT0).         DT3
-----  -----------  ----------------------------------------  --------------------------------------------------------  ----
DT2    FrmWaitLink  1 Current frame not yet finished sending  Wait for frame to be sent.                                DT2
                    2 Current D-RESET frame finished sending  Start T1; set RESET-ACK-EXPECTED; move frame to UNACKED.  DT1
                    3 Current D-*-ACK frame finished sending  Recycle buffer.                                           DT1
-----  -----------  ----------------------------------------  --------------------------------------------------------  ----
DT3    FrmWaitData  1 Current frame not yet finished sending  Wait for frame to be sent.                                DT3
                    2 Current frame finished sending          Start T1; set DATA-ACK-EXPECTED; move frame to UNACKED.   DT1

The 9P server is expected to send frames via some service interface which COBS-encapsulates data prior to putting frames onto the LINKQ or DATAQ queues. The service interface is required to segment large responses into 128-byte payloads as required by the data link.
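
The segmentation step might look something like the sketch below. It is only an illustration: the function name and the (ctl, payload) return shape are assumptions, and the FCS and COBS steps are deliberately left out.

```python
D_DATA = 2          # frame type code for D-DATA frames
MAX_PAYLOAD = 128   # payload limit imposed by the data link


def segment(message, tseq, max_payload=MAX_PAYLOAD):
    """Split a 9P message into (ctl, payload) pairs.

    Returns the list of pairs along with the TSEQ value to use for the
    next D-DATA frame. The counter is 4 bits wide, so it wraps from 15 to 0.
    """
    frames = []
    for offset in range(0, len(message), max_payload):
        ctl = ((tseq & 0x0F) << 4) | D_DATA
        frames.append((ctl, message[offset:offset + max_payload]))
        tseq = (tseq + 1) & 0x0F
    return frames, tseq
```

A 300-byte response, for instance, would be carried by three D-DATA frames holding 128, 128, and 44 bytes of payload.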

9P Layer

In general, the 9P server is too complex to describe in detail in this specification. However, what I can describe are some ideas I've been thinking about regarding how to identify different kinds of attached peripherals without having to invest a lot of effort in deep filesystem introspection. The files discussed in this section are to be taken as recommendations, not requirements. Finally, these are just ideas, and are subject to change at any time based on experience and feedback received.

/compat

This file's purpose is to identify compatible device drivers to bind to the device. Its format is loosely inspired by the Device Tree Specification's compatible property, as well as Microsoft Component Object Model's use of a "GUID" to separate the identity of a component class (CLSID) from a specific software implementor. Device drivers are selected in the following order of preference:

  1. Vendor-specific driver.
  2. Vendor-agnostic but model-compatible driver.

If no automated means of selecting a driver is available, that's also acceptable; however, it will be the responsibility of the operator to select and activate appropriate driver software for the device.

It records the following information in the format described below:

human-string[s] ncompat[2] ncompat*compat

The human-string field should contain a human-readable identification of the equipment. For example, if I am the maker of the device, and the model is an SD/MMC card reader, then the human-string field might read something like, "SD/MMC Slot, by Samuel A. Falvo II". If another maker decides to clone my implementation for commercial production, then the string must be updated accordingly to their product naming conventions. For example, "ExampleCorp ExampleMedia 1000 SD/MMC Reader". The string should not contain any line endings or NUL termination.

The ncompat field indicates how many compatibility records exist, possibly even 0. Each compat record consists of a single UUID:

class[16]

Each class UUID identifies a class of functionality that the peripheral supports via the 9P interface it provides. For example, it would not be practical to list separate UUIDs for 360KB, 720KB, 1.44MB, and 2.88MB floppy disk formats supported by PC floppy drives (much less the 400KB, 440KB, 800KB, 880KB, and 1760KB formats offered by Mac and Amiga floppy drives!); these kinds of device capabilities are best inquired using more appropriate facilities offered through the 9P filesystem interface. However, it is appropriate to list one UUID indicating your unique make and model and another UUID indicating that it's compatible with generic "fixed-block-allocated, direct access storage" devices.

The very first class should uniquely identify your hardware make and model. Subsequent class records should be listed in the order of decreasing specificity of compatibility to the peripheral. If an operating system provides a class to driver mapping database, this would enable the operating system to try locating the most hardware-specific device driver first, followed by a driver not quite as specific (perhaps by the same manufacturer but not specifically tailored for your device), and so forth until eventually a completely generic driver is sought.
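
To make the record layout and the driver search order concrete, here is a rough sketch. It assumes the 9P conventions for the notation used above: a string[s] is a 2-byte little-endian length followed by UTF-8 text, and ncompat[2] is a little-endian 16-bit count. All function names, and the idea of a dictionary acting as the class-to-driver database, are hypothetical.

```python
import struct


def build_compat(human, classes):
    """Serialize a /compat file body from a description and 16-byte class IDs."""
    text = human.encode("utf-8")
    out = struct.pack("<H", len(text)) + text + struct.pack("<H", len(classes))
    for class_id in classes:
        if len(class_id) != 16:
            raise ValueError("each class record is a 16-byte UUID")
        out += class_id
    return out


def parse_compat(blob):
    """Return (human_string, [class IDs]) from a /compat file body."""
    (slen,) = struct.unpack_from("<H", blob, 0)
    human = blob[2:2 + slen].decode("utf-8")
    (ncompat,) = struct.unpack_from("<H", blob, 2 + slen)
    base = 4 + slen
    if len(blob) < base + 16 * ncompat:
        raise ValueError("truncated /compat contents")
    classes = [blob[base + 16 * i:base + 16 * (i + 1)] for i in range(ncompat)]
    return human, classes


def pick_driver(classes, drivers):
    """Walk the classes from most to least specific; return the first match."""
    for class_id in classes:
        if class_id in drivers:
            return drivers[class_id]
    return None   # no automatic match; the operator chooses a driver
```

Because the classes are listed in decreasing specificity, a simple first-match search is enough to prefer a vendor-specific driver over a generic one.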

UUIDs are treated as opaque, 128-bit, little-endian numbers. Thus, if you generate a UUID as follows (assuming a Linux computer):

$ uuidgen
40e22855-9eeb-467d-89f1-5bbb2151f8dc

then the UUID would appear in the file as follows:

DC F8 51 21 BB 5B F1 89 7D 46 EB 9E 55 28 E2 40
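
The same conversion can be sketched in Python using the standard uuid module; the function name is made up for this example.

```python
import uuid


def uuid_to_file_bytes(text):
    """Return the 16 bytes of a UUID as a little-endian 128-bit number."""
    # UUID.bytes is the big-endian form, so reversing it gives the file order.
    return uuid.UUID(text).bytes[::-1]


encoded = uuid_to_file_bytes("40e22855-9eeb-467d-89f1-5bbb2151f8dc")
print(" ".join(f"{b:02X}" for b in encoded))
```

Note that this is not the same as the module's bytes_le attribute, which reverses only the first three fields of the UUID rather than all 16 bytes.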

Specific UUIDs for specific features are not specified in this document.

A word on the use of UUIDs instead of Device Tree-like "make,model" strings, if I may. I opted to use UUIDs instead of strings for several reasons:

  1. They do not require a centralized authority to act as a registry for manufacturer IDs.
  2. Most operating systems today either come with or provide easily installed packages containing tools which generate them.
  3. You don't have to think of clever names for your company and/or models. You'll eventually want these anyway, but they can be decided upon when it's time to market your wares. You don't have to worry about picking a good name during development. Similarly, you don't need to worry about altering the UUIDs after you're done with development and ready to market the device.
  4. They occupy a fixed amount of space in memory, and are easy for both high-level and low-level programming languages to use.

There are, of course, some deficiencies to relying on UUIDs.

  1. You need a look-up table or other kind of database which maps UUIDs to human-readable strings for makes and models if you wish to identify devices to a human operator.

This is, as I write this document, the only deficiency I can think of. As you can see, the merits of using UUIDs seem to outweigh the use of human-readable strings. However, as I gain experience and feedback from other contributors on this matter, I wish to give notice now that my preference for UUIDs may change in the future. For now, however, this seems to be the right way to go.

/interfaces

This file serves a similar role to /compat above; however, its focus is a bit different. It focuses exclusively on listing individual capabilities of the device, and makes no attempt at classifying sets of capabilities as /compat tries to do.

The format for this file is similar to /compat:

nifs[2] nifs*interface[16]

The nifs field identifies how many supported interfaces exist. This should always be at least 1. Each interface supported by the device is described by one interface UUID.

For example, a RAID controller can expose one of several types of interfaces:

  1. It itself may be a block storage device (if it exposes a single RAID array volume).
  2. It can act like a switch to individual disk drives making up the RAID array.
  3. It can act like a switch to individually configured RAID volumes.

So, array vendor 1 might create an /interfaces file with three UUIDs in it (in any order):

However, array vendor 2 might take a different approach, with the following interfaces instead:

/interfaces vs /compat?

It's not clear to me which approach is superior. Experience with COM suggests that /interfaces is more flexible and can support a wider array of implementations. However, it's also the case that the sets of interfaces tend to be batched together, which can make supporting classes of devices somewhat easier. At this early juncture, only actual experience can provide the feedback necessary to decide which approach (or both!) is appropriate.

/swver

This file contains a textual string identifying the version of the software running on the equipment. It should be treated opaquely, and is intended explicitly for human consumption.

For example, 1.2.0 or 2019.4rc2. The string has no carriage-return or line-feed ending, and no NUL termination.